Detecting Moving Objects in DAS Recordings

Authors:

Krzysztof Skrobała 156039
Wojciech Bogacz 156034

Introduction

In this project, we analyzed Distributed Acoustic Sensing (DAS) data captured as a 2D numpy matrix, representing strain rates along a fiber optic cable installed on Jana Pawła II street. The data reflects the vibrations caused by passing trams, trucks, and cars. Using image processing and signal analysis techniques, we identified vehicle, tracked their movement, and estimated their velocities, providing valuable insights into traffic patterns on this busy street.

The task is very difficult, therefore we tested 2 approches. None of these is the best, however, some scenarios are better in one case and some in other.

On the form of the report

First we will introduce the data and methods on a single example. Then we will present the results on the whole dataset.

Example data

Example data file:

The x-axis represent the spatial position on the cable, while the y-axis represents a given timestamp. The color intensity represents the strain rate at a given point in time and space. The data is very noisy and contains multiple moving objects, which we aim to detect and track.

Note that the data shape is python data.shape and the display streches the image horizontaly

Preprocessing

preprocessing_flow_chart
preprocessing_flow_chart

example_filename = "090332.npy"

img = load_from_file(example_filename).to_numpy()

To reduce noise and magnify the data we tried many methods in various combinations. The most fulfilling approach consists of the following steps:

  1. Absolute value - The values can be negative, but we are interested only in the magnitude of the signal, so we can take the absolute value
img = np.abs(img)
  1. Clip to the 99% percentile of intensity distribution - As we can see from the plot, data contains outliers, while most of the data lies in a one region
plt.title("Data distribution before removing outliers")
plt.hist(img.flatten(), bins=100)
plt.show()

high = np.percentile(img, 99)
img = np.minimum(img, high)

plt.title("Data distribution after removing outliers")
plt.hist(img.flatten(), bins=100)
plt.show()

  1. Standardize the data in range 0-255 - To normalize the data for consistent comparison and processing.
img = np.around(255 * (img - np.min(img)) / (np.max(img) - np.min(img))).astype(
    np.uint8
)
plot_numpy(img, title="Standarized data")

  1. Non-local Means Denoising algorithm - We implement fastNlMeansDenoising with agressive parameters to reduce noise
img = cv2.fastNlMeansDenoising(img, templateWindowSize=7, searchWindowSize=21, h=14)
plot_numpy(img, title="Non-local Means Denoising")

  1. blur - We blur the image with a kernel with the greater vertical size, as the data is very narrow in the horizontal dimension.
img = cv2.blur(img, (3, 41))
plot_numpy(img, title="Blurred")

  1. Erosion - morphological operation - As various signals tend to be close to each other, and we don’t need much of a aconnectivity as well as edges details, we perform an erosion operation.
img = cv2.morphologyEx(
    img,
    cv2.MORPH_ERODE,
    cv2.getStructuringElement(cv2.MORPH_RECT, (3, 9)),
    iterations=1,
)
plot_numpy(img, title="Erosion")

  1. Binary conversion - Now we can perform thresholding segmentation to get a binary image.
_, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
plot_numpy(img, title="Binary conversion")

Main idea

The main idea behind the two approaches is to detect lines in the data, assuming contant speed of the vehicles

Then, the speed can be calculated from the formula:

def velocity_from_slope(slope: float) -> float:
    return round(abs(1 / slope) * DX / DT, 2)

The slope is inverted, because we use the x-axis as the spatial dimension and the y-axis as the time dimension. The velocity is calculated in meters per second.

FIRST APPROACH

The idea of our first approach was to cluster the values from the different data sources. Then we would apply linear regression within the clusters, on the (x,y) cooridinates of non-zero values. The slope of the regression line would give us the velocity of the vehicle.

method_flow_chart
method_flow_chart

1. Clustering

Because of the nature of the data and the fact that we want to cluster datapoints by density rather than closeness (as in for example KMeans) we chose the DBSCAN algorithm. It does not require specifying the number of clusters in advance, making it a versatile choice for exploratory data analysis.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups together points that are closely packed together, marking as outliers points that lie alone in low-density regions. It requires two parameters: eps (the maximum distance between two points to be considered neighbors) and min_samples (the minimum number of points required to form a dense region). DBSCAN is particularly effective for discovering clusters of arbitrary shape and handling noise in the data.

X = np.nonzero(img)
X = np.vstack(X).T

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

clustering = DBSCAN(eps=0.08, min_samples=300).fit(X_scaled)

no_of_clusters = int(np.max(clustering.labels_) + 1)

The algorithm detected python no_of_clusters clusters in this exemplary data.

colors = generate_colors(no_of_clusters)

colored_clusters = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
for cluster_id in range(no_of_clusters):
    # Indices of the points in the cluster
    idx = X[clustering.labels_ == cluster_id, :]

    colored_clusters[idx.T[0], idx.T[1], :] = colors[cluster_id]

plot_numpy(colored_clusters, title="Clustered data")

2. Linear Regression

Once we have separated clusters, we can apply linear regression to each of them to estimate the velocity of the moving object.

As some clusters are separated noise, we discard these with R2 < 0.3.

lines = []
velocities = []
for cluster_id in range(no_of_clusters):
    idx = X[clustering.labels_ == cluster_id, :]
    x_coords = idx[:, 1].reshape(-1, 1)
    y_coords = idx[:, 0]

    model = LinearRegression()
    model.fit(x_coords, y_coords)

    slope = model.coef_[0]
    intercept = model.intercept_

    x_start = np.min(x_coords)
    x_end = np.max(x_coords)

    r2 = model.score(x_coords, y_coords)
    print(f"R2: {r2}")

    if r2 < 0.3:
        print(f"Discarding object with R2 < 0.3 ({r2})")
        continue

    velocity = velocity_from_slope(slope)
    print(f"Detected velocity: {velocity} m/s ({''})")
    velocities.append(velocity)

    lines.append((slope, intercept, x_start, x_end))
## LinearRegression()
## R2: 0.92240659229734
## Detected velocity: 27.13 m/s ()
## LinearRegression()
## R2: 0.8707396109109362
## Detected velocity: 18.26 m/s ()
## LinearRegression()
## R2: 0.0029409018431784117
## Discarding object with R2 < 0.3 (0.0029409018431784117)
plot_numpy_with_lines(img, lines, title="Detected velocities")

SECOND APPROACH

In the second method, we approached the issue from a different angle. First, we generated lines using Houghline and then, we clusterd them.

method_2_flow_chart
method_2_flow_chart

1. Hough Lines

The Hough Line Transform is a feature extraction technique used in image analysis and computer vision to detect straight lines in an image. It works by transforming points in the image space to the parameter space, where each point in the image space corresponds to a sinusoidal curve in the parameter space. Lines in the image space are identified by finding intersections of these curves in the parameter space. This method is robust to noise and can detect lines even if they are broken or partially obscured. It is commonly used in applications like lane detection in autonomous driving and shape analysis.

To cluster lines we used the same algorithm like in the first method - DBSCAN.

Implementation

Because our data is very narrow (python w) and the lines are visible on the streched image, it makes sense to resize the date, so the HoughLines algorithm has more freedom in term of slope selection

aspect_ratio = 12 / 16
h, w = img.shape

new_w = int(aspect_ratio * h)

img = cv2.resize(img, (new_w, h), interpolation=cv2.INTER_NEAREST)

plot_numpy(img)

Now we can apply HoughLines algoritm, which will detect many lines going through the dense regions


img_before = img.copy()

lines = cv2.HoughLinesP(
    img,  # Input edge image
    1,  # Distance resolution in pixels
    np.pi / 180,  # Angle resolution in radians
    threshold=500,  # Min number of votes for valid line
    minLineLength=1500,  # Min allowed length of line
    maxLineGap=400,  # Max allowed gap between line for joining them
)

Because some of the lines may be perfectly vertical, and we are interested in the slopes which for them is an infinity we erase them

valid_lines = lines[~(lines[:, :, 0] == lines[:, :, 2]), :]

Now we can plot the lines

img_lines = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
color_palette = generate_colors(len(valid_lines))
for i in range(len(valid_lines)):
    _ = cv2.line(
        img_lines,
        (valid_lines[i, 0], valid_lines[i, 1]),
        (valid_lines[i, 2], valid_lines[i, 3]),
        [int(c) for c in color_palette[i]],
        1,
        cv2.LINE_AA,
    )
plot_numpy(img_lines)

2. Clustering

We now have to separate the line clusters and aggregate them.

We tried using (slope, intercept) and (center_x, center_y) combinations

valid_lines_directional = np.apply_along_axis(
    lambda x: np.array([*get_slope_and_intercept(*x)]), axis=1, arr=valid_lines
)

valid_lines_centers = np.apply_along_axis(
    lambda x: np.array([(x[0] + x[2]) / 2, (x[1] + x[3]) / 2]),
    axis=1,
    arr=valid_lines,
)

Normalizing the data

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(valid_lines_centers)

clustering = DBSCAN(eps=0.15, min_samples=16).fit(X_scaled)


no_of_clusters = np.max(clustering.labels_) + 1

The DBSCAN algorithm has in-built outliers detection, so we ommit lines marked as cluster -1 (python sklearn implementation)

img_clusters = cv2.cvtColor(img_before, cv2.COLOR_GRAY2RGB)
color_palette = generate_colors(no_of_clusters)
line_colors = np.apply_along_axis(lambda x: color_palette[x],  0, clustering.labels_)
line_colors[ clustering.labels_ == -1 ] = [0,0,255]

for i in range(len(valid_lines)):
    _ = cv2.line(
        img_clusters,
        (valid_lines[i, 0], valid_lines[i, 1]),
        (valid_lines[i, 2], valid_lines[i, 3]),
        [int(c) for c in line_colors[i]],
        1,
        cv2.LINE_AA,
    )

plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=clustering.labels_)
## <matplotlib.collections.PathCollection object at 0x767381120e90>
plt.show()

plot_numpy(img_clusters)

average_lines = []
velocities = []
for cluster_id in range(no_of_clusters):
    average_line = np.mean(valid_lines_directional[clustering.labels_ == cluster_id], axis=0)
    x_min = 0
    x_max = new_w
    average_lines.append([*average_line, x_min, x_max])

    slope = average_line[0]
    velocities.append(velocity_from_slope(slope))

img_average_lines = cv2.cvtColor(img_before, cv2.COLOR_GRAY2RGB)

plot_numpy_with_lines(img_average_lines, average_lines)

Results on the whole dataset

import hough_pipeline
import regression_pipeline
from image_processing import preprocess, initial_preprocess
from data import load_all_files

i = 0
for file in load_all_files():
    i+=1
    original = initial_preprocess(file.to_numpy())
    if i == 4:
        break
    
    img = preprocess(file.to_numpy())
    plot_numpy(original, title="original")
    hough_pipeline.detect_velocities(img, original)
    regression_pipeline.detect_velocities(img,  original)
## <class 'numpy.ndarray'>
## Detected velocity: 20.46294858118199 m/s
## Detected velocity: 316.7668700661404 m/s
## Detected velocity: 30.34 m/s)
## Detected velocity: 44.48 m/s)
## <class 'numpy.ndarray'>
## Detected velocity: 16.859782376786857 m/s
## Detected velocity: 22.827744826114788 m/s
## Detected velocity: 27.13 m/s)
## Detected velocity: 18.26 m/s)
## <class 'numpy.ndarray'>
## Detected velocity: 35.36110945167485 m/s
## Detected velocity: 26.74 m/s)